Abstract

We prove two complexity results about the H-index concerned with
the Google Scholar merge operation on one's scientific articles. The results
show that, although it is hard to merge one's articles in an optimal way,
it is easy to merge them in such a way that one's H-index increases. This
suggests the need for an alternative scientific performance measure that
is resistant to this type of manipulation.

1 Introduction 


The H-index was introduced by the physicist J.E. Hirsch in [3] to 'quantify an
individual's scientific research output'. Recall that it is defined as the largest x
such that one's x-th most cited paper is cited at least x times. (An aside: Hirsch's
original definition was ambiguous, as pointed out in [4], where the current
definition is proposed.) Its introduction led to an impressive literature. According
to Google Scholar, on 25 February 2013 this paper had been cited 2816 times. To
mention just one example, [5] provided its axiomatic definition.

The H-index started to be used as a universal measure to assess and compare
researchers in a given discipline. Hirsch suggested in his paper '(with large
error bars) that for faculty at major research universities, $h \approx 12$ might be a
typical value for advancement to tenure (associate professor) and that $h \approx 18$
might be a typical value for advancement to full professor'.

In fact, computer scientists seem to cite each other much more often.
Jens Palsberg maintains at
http://www.cs.ucla.edu/~palsberg/h-number.html a list of
computer scientists with H-index 40 or higher (a value corresponding in
Hirsch's article to Nobel prize winners). The list has more than 600 names and
is based on the output generated by Google Scholar.

Several people have observed that the H-index can be boosted
by such simple measures as adding your name to the articles written by members
of your group, splitting a long article into a couple of shorter ones, citing one's
own and each other's work, etc. For example, [1] studies the problem of manipulability
of the H-index by means of self-citations.

This brings us to the subject of this note. Google Scholar allows one to
perform some operations on the listed articles, notably merge, which allows one to
combine two versions of an article even if they have different titles. By means
of the merge operation you can obviously improve your H-index. Suppose for
instance that your H-index is 20 and that each of your 20 most cited articles
is cited at least 21 times. Then you can increase it to 21 by merging two
articles that are each cited 11 times.
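
The effect of a single merge can be checked with a minimal Python sketch. The function names `h_index` and `merge` and the citation profile below are hypothetical, chosen only to mirror the example above; a merge is modelled simply as adding the two citation counts.

```python
def h_index(citations):
    """Largest x such that the x-th most cited paper has at least x citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def merge(citations, i, j):
    """Return a new list in which papers i and j are replaced by one paper
    whose citation count is the sum of the two (illustrative model only)."""
    merged = [c for pos, c in enumerate(citations) if pos not in (i, j)]
    merged.append(citations[i] + citations[j])
    return merged


# Hypothetical profile: 20 papers cited 21 times each, 2 papers cited 11 times each.
profile = [21] * 20 + [11, 11]
print(h_index(profile))                  # H-index before the merge
print(h_index(merge(profile, 20, 21)))   # H-index after merging the two 11s
```

Under these assumptions the H-index moves from 20 to 21, because the merged article (22 citations) becomes a 21st article with at least 21 citations.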

This suggests two natural problems, where in each case we refer to the 
improvement of the H-index by means of the merge operations. 

• Is it possible to improve your H-index? 

• Given a number k, determine whether your H-index can be improved to
at least k.



2 Two results 

To deal with these questions we introduce first some notation. A researcher's 
output is represented as a multiset of natural numbers, each number represent- 
ing a publication and its value representing the number of its citations. For 
example the multiset {1, 1, 2, 3, 4, 4, 5, 5, 5} represents an output consisting of 9 
publications with the corresponding H-index 4. Given a multiset T of numbers
we abbreviate $\sum_{x \in T} x$ to $\sum T$. So $\sum T$ is the number of citations resulting
from the merge of the publications in T into one.

To deal with the outcomes of merges we need to consider partitions of such 
multisets. 

Fix a finite multiset $S$ of numbers from $\mathbb{N}_{>0}$. We denote by $\bar{S}$ the singletons
partition $\{\{x\} \mid x \in S\}$. Given a partition $\mathcal{T}$ of $S$, we define

$v(\mathcal{T}) = \max\{|\mathcal{T}'| \mid \mathcal{T}' \subseteq \mathcal{T},\ \forall T \in \mathcal{T}' : \sum T \geq |\mathcal{T}'|\}.$

In words, call a subset $\mathcal{T}'$ of the partition $\mathcal{T}$ good if each element $T$ of $\mathcal{T}'$
after merge into a single publication yields at least $|\mathcal{T}'|$ citations. So if one
allows the merge operation, then a good subset $\mathcal{T}'$ ensures that the H-index
can be set to at least $|\mathcal{T}'|$. Then $v(\mathcal{T})$ is the cardinality of the largest good
subset of $\mathcal{T}$, hence $v(\mathcal{T})$ is the H-index one obtains by performing the
merges prescribed by $\mathcal{T}$, while $v(\bar{S})$ is the H-index corresponding to the input multiset
$S$. To put it more directly,

$v(\bar{S}) = \max\{|T| \mid T \subseteq S,\ \forall x \in T : x \geq |T|\},$

where we refer to the submultisets. When no confusion arises, we also write
$v(S)$ for $v(\bar{S})$.

We call a partition $\mathcal{S}$ of $S$ an improving partition if $v(\mathcal{S}) > v(\bar{S})$. We can
now formalize the above two problems as follows, given as input a finite multiset
$S$ of numbers in $\mathbb{N}_{>0}$.
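
The value of a partition can be evaluated directly from this definition. The following is a minimal sketch, assuming a partition is represented as a list of lists of positive integers; the name `v_of_partition` and the sample partition are hypothetical.

```python
def v_of_partition(partition):
    """H-index obtained after merging every block of the partition into one paper.

    `partition` is a list of blocks, each block a list of citation counts.
    """
    sums = sorted((sum(block) for block in partition), reverse=True)
    best = 0
    for size, total in enumerate(sums, start=1):
        # The `size` largest merged papers form a good subset
        # exactly when the smallest of them has at least `size` citations.
        if total >= size:
            best = size
    return best


S = [1, 1, 2, 3, 4, 4, 5, 5, 5]
singletons = [[x] for x in S]
merged = [[5], [5], [5], [4, 1], [4, 1], [2, 3]]

print(v_of_partition(singletons))  # value of the singletons partition
print(v_of_partition(merged))      # value after the merges
```

For the multiset from the example above, the singletons partition has value 4, while the sample partition consists of six blocks that each sum to 5 and therefore has value 5, so it is an improving partition.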



H-index improvement problem Does there exist an improving partition? 
If yes, find it. 

H-index achievability problem Given a number $k$, does there exist a
partition $\mathcal{T}$ of $S$ such that $v(\mathcal{T}) \geq k$?

In the appendix we present the proofs of the following two results. 

Theorem 1. The H-index improvement problem can be solved in polynomial 
time. 

Theorem 2. The H-index achievability problem is strongly NP-complete.

In particular, the problem of computing the maximal H-index achievable
through the merge operation is strongly NP-hard.
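
For very small inputs the achievability problem can still be decided by exhaustive search. The sketch below is only an illustration and not an algorithm from this note: it enumerates every partition of the input, so its running time grows with the Bell numbers. The names `set_partitions` and `achievable` are hypothetical.

```python
def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + smaller


def achievable(citations, k):
    """Brute force: is there a partition with at least k blocks of sum >= k?"""
    for partition in set_partitions(list(citations)):
        if sum(1 for block in partition if sum(block) >= k) >= k:
            return True
    return False


print(achievable([5, 4, 3, 3, 3, 2], 4))
print(achievable([5, 3, 3, 3, 3, 2], 4))
```

The test used here relies on the observation that a partition has value at least k exactly when at least k of its blocks sum to at least k. Equal numbers are treated as distinct items, so some partitions are examined more than once.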

From the viewpoint of manipulability, Theorem 1 is bad news. Ideally, we
would like to have a performance measure that is computationally difficult to 
manipulate. One can see a parallel with the search for voting methods that 
are difficult to manipulate; see, e.g., [2]. Our conclusion is that the H-index is
not the last word in the ongoing quest to find a credible way to quantify one's 
scientific output. 

Appendix 

In what follows, we assume that a multiset is represented as a list of possibly 
duplicate numbers. A different way of representing a multiset would be the more 
compact one, where we list only the distinct numbers that appear in the multiset, 
along with their respective multiplicity. We consider the latter representation 
to be unnatural, given the context in which we study this problem. 

Proof of Theorem 1. Let $S$ be the given multiset. Let $S'$ be the smallest
submultiset of $S$ such that $v(S) = v(S')$. For instance, if $S = \{5,4,3,3,3,2\}$,
then $S' = \{5,4,3\}$ and if $S = \{5,3,3,3,3,2\}$, then $S' = \{5,3,3\}$. In both
cases $v(S) = 3$. Call a number $x \in S'$ supercritical if $x > v(S)$ and critical if
$x = v(S)$. Let $C^{+}$ be the multiset of all supercritical numbers in $S'$ and $C$ the
multiset of all critical numbers in $S'$. Note that $C$ and $C^{+}$ partition $S'$ and
that $v(S) = |C^{+}| + |C|$. Furthermore, let $L$ denote the multiset of the $|C|$ smallest
numbers in $S$.

For instance, if $S = \{5,4,3,3,3,2\}$, then $C = \{3\}$ and $L = \{2\}$, and if
$S = \{5,3,3,3,3,2\}$, then $C = \{3,3\}$ and $L = \{3,2\}$.
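
These multisets are easy to compute after sorting. The following sketch assumes that $S'$ is taken to be the $v(S)$ largest numbers of $S$, as in the two examples; the function name `critical_sets` is hypothetical.

```python
def critical_sets(citations):
    """Return (v, C_plus, C, L) for the multiset `citations`.

    Assumption: S' consists of the v largest numbers, as in the examples.
    Equal numbers are told apart by their position in the sorted list.
    """
    s = sorted(citations, reverse=True)
    # In a descending list the condition c >= rank holds on a prefix only,
    # so counting the ranks that satisfy it yields the H-index.
    v = sum(1 for rank, c in enumerate(s, start=1) if c >= rank)
    s_prime = s[:v]
    c_plus = [x for x in s_prime if x > v]   # supercritical numbers
    c = [x for x in s_prime if x == v]       # critical numbers
    low = s[len(s) - len(c):]                # the |C| smallest numbers of S
    return v, c_plus, c, low


print(critical_sets([5, 4, 3, 3, 3, 2]))
print(critical_sets([5, 3, 3, 3, 3, 2]))
```

On the two example multisets this reproduces the values of $C$ and $L$ given above, together with $C^{+} = \{5,4\}$ and $C^{+} = \{5\}$ respectively.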

Note that below, we treat duplicate numbers in $S$ as having "separate
identities", so that for two numbers $x, y \in S$ that are equal in magnitude, it may
hold that $x \in C$ but $y \notin C$, or $x \in L$ but $y \notin L$. We believe that this slight
informality and definitional abuse will cause no confusion to the reader.

We first establish the following characterization result. 



Lemma 1. There exists an improving partition of $S$ iff $L \cap C = \emptyset$ and
$\sum \bigl(S \setminus (C \cup C^{+} \cup L)\bigr) > |C| + |C^{+}|$.

Proof. Suppose there exists an improving partition $\mathcal{S}$ of $S$.

We can assume without loss of generality that the following properties then
hold:

1. Each supercritical number in $S$ appears in a singleton set in $\mathcal{S}$. These are
the only singleton sets in $\mathcal{S}$.

Indeed, if a supercritical number $x \in S$ appears in a non-singleton set
$T \in \mathcal{S}$, then take the partition $\mathcal{T}$ of $S$ obtained from $\mathcal{S}$ by splitting $T$ into
singletons. Because $\mathcal{S}$ is an improving partition, there are at least $v(S)$
multisets $T' \in \mathcal{S} \setminus \{T\}$ such that $\sum T' > v(S)$. All multisets of $\mathcal{S} \setminus \{T\}$ are
in $\mathcal{T}$. Also the number $x$ is in a singleton set of $\mathcal{T}$ and $x > v(S)$. Therefore,
there are in $\mathcal{T}$ at least $v(S) + 1$ multisets $T'$ such that $\sum T' > v(S)$. Hence,
$\mathcal{T}$ is an improving partition.

After we have repeatedly performed the above splitting steps we obtain an
improving partition $\mathcal{S}'$ such that each supercritical number $x \in S$ appears
in a singleton set in $\mathcal{S}'$.

Since

$v(\mathcal{S}') > v(S) = |C^{+}| + |C| \geq |C^{+}|,$

there exists in $\mathcal{S}'$ a non-singleton multiset $T \in \mathcal{S}'$ that contains only
non-supercritical numbers. Merging with it all singleton sets that contain a
non-supercritical number yields the desired improving partition.

2. $L$ is disjoint from $C$.

By Property 1, the supercritical numbers form singleton sets in $\mathcal{S}$, and each
remaining multiset has cardinality at least 2. If $L$ were not disjoint from
$C$, then we would have $|S| < |C^{+}| + |L| + |C|$, so $|S \setminus C^{+}| < |L| + |C| = 2|C|$,
hence the number $\ell$ of non-singleton multisets in $\mathcal{S}$ would be at most $|C|$.
This yields a contradiction, since we would then have $v(\mathcal{S}) \leq |C^{+}| + \ell \leq
|C^{+}| + |C| = v(S)$.

3. In $\mathcal{S}$, every critical number is in a set of cardinality 2.

Indeed, by Property 1, critical numbers do not appear in singleton sets.
Further, if a critical number $x \in S$ appears in a multiset $T \in \mathcal{S}$ of
cardinality exceeding 2, then we can split $T$ in any way so that $x$ is put in
a multiset $T'$ of cardinality 2. It then holds that $\sum T' > v(S)$, so the
resulting partition remains an improving partition.

4. There is a bijection $\pi : C \to L$ such that $\{x, \pi(x)\} \in \mathcal{S}$ (i.e., $C$ is
"matched" with $L$ in $\mathcal{S}$).

Indeed, by Property 3, every critical number is in a set of cardinality 2.
Now, let $x$ be a critical number and let $\{x, y\} \in \mathcal{S}$ be the multiset of
cardinality 2 that contains $x$. If $y$ is not in $L$, then $|C| = |L|$ implies
that there is a number $y' \in L$ that occurs in a multiset $T$ in $\mathcal{S}$ that does
not contain a critical number. Because $y' \leq y$, the operation of swapping
$y'$ and $y$ in $\mathcal{S}$ does not decrease the number of multisets that sum to at
least $v(S) + 1$. So the partition that results after this swap remains an
improving partition.

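We have $v(\mathcal{S}) > v(S) = |C^{+}| + |C|$, so by Properties 1, 2, and 4, there is a
multiset $T \in \mathcal{S}$ not intersecting $C^{+}$, $C$, and $L$, such that $\sum T > v(S)$. Hence
$\sum \bigl(S \setminus (C \cup C^{+} \cup L)\bigr) \geq \sum T > v(S) = |C| + |C^{+}|$. We conclude that if there is an
improving partition, then $L \cap C = \emptyset$ and $\sum \bigl(S \setminus (C \cup C^{+} \cup L)\bigr) > |C| + |C^{+}|$.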

Conversely, if $L \cap C = \emptyset$ and $\sum \bigl(S \setminus (C \cup C^{+} \cup L)\bigr) > |C| + |C^{+}|$, then there is
an improving partition. It consists of

• the singletons, each containing an element of $C^{+}$,

• the sets of cardinality 2, each containing a pair of elements from $C$ and $L$,

• the multiset $S \setminus (C \cup C^{+} \cup L)$.

□

The proof of Theorem 1 is now immediate. It is straightforward to compute
$C^{+}$, $C$ and $L$ in polynomial time. Using the above lemma we can therefore
determine in polynomial time whether an improving partition exists, and find
one in polynomial time if it does. □
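
The procedure implied by Lemma 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: $S'$ is taken to be the $v(S)$ largest numbers, duplicates are distinguished by their position in the sorted list, and the name `improving_partition` is hypothetical.

```python
def improving_partition(citations):
    """Return an improving partition (list of blocks) if the condition of
    Lemma 1 holds, and None otherwise. Sorting dominates the running time."""
    s = sorted(citations, reverse=True)
    v = sum(1 for rank, c in enumerate(s, start=1) if c >= rank)  # H-index
    n_crit = sum(1 for x in s[:v] if x == v)   # |C|
    n_super = v - n_crit                       # |C+|
    if len(s) < v + n_crit:
        return None                            # L and C would overlap
    c_plus = s[:n_super]
    c = s[n_super:v]
    low = s[len(s) - n_crit:]                  # L: the |C| smallest numbers
    rest = s[v:len(s) - n_crit]                # S \ (C u C+ u L)
    if sum(rest) <= v:
        return None                            # the sum condition fails
    # Singletons for C+, pairs matching C with L, and one block for the rest.
    return [[x] for x in c_plus] + [[x, y] for x, y in zip(c, low)] + [rest]


print(improving_partition([5, 4, 3, 3, 3, 2]))
print(improving_partition([5, 3, 3, 3, 3, 2]))
```

For $S = \{5,4,3,3,3,2\}$ the sketch returns the partition with blocks $\{5\}$, $\{4\}$, $\{3,2\}$, $\{3,3\}$, which has four blocks of sum at least 4. For $S = \{5,3,3,3,3,2\}$ the remaining multiset is $\{3\}$, whose sum does not exceed 3, so no improving partition is reported.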

Proof of Theorem 2. The problem is clearly in NP, so the proof will focus on
establishing NP-hardness. We do this by means of a polynomial time reduction
from a strongly NP-complete problem. The reduction is from the 3-PARTITION
problem. In the 3-PARTITION problem, we are given a multiset $M$ of $3m$
positive integers, such that $\sum M = mb$ for some $b \in \mathbb{N}$. We have to decide
whether it is possible to partition this multiset into $m$ submultisets, such that the
sum of the numbers in each submultiset is exactly $b$.

Garey and Johnson [5] prove that the 3-PARTITION problem is strongly
NP-complete, even under the assumption that $M$ is represented as above (i.e.,
non-concisely). This means that the 3-PARTITION problem is NP-complete
even when $b$ is bounded by some polynomial in $m$. Denote this polynomial by
$p(m)$. From now on, by the SPECIAL 3-PARTITION problem we will mean
the special case of the problem where $b$ is bounded by $p(m)$.

Before proceeding, one note is in order. In the original definition of the
3-PARTITION problem, the additional requirement is imposed that all sets in the
partition are of cardinality 3 (and this is also where the name of the problem
originates from). For convenience, we do not impose this requirement here.
It is not necessary to impose this requirement because in [5] it is
shown that strong NP-hardness holds even when all numbers in the multiset are
strictly between $b/4$ and $b/2$. This enforces that all sets in the partition will be
of cardinality 3. Without the cardinality constraint, the problem thus becomes
more general, and is automatically strongly NP-hard.



Given a SPECIAL 3-PARTITION instance $(S', m, b)$, we reduce it to an
H-index achievability problem instance $(S, k)$ as follows. First, obtain $S''$ from $S'$
by adding $m$ to each number in $S'$. Note that $(S'', m, k)$, where $k = b + 3m$, is
a YES-instance of 3-PARTITION if and only if $(S', m, b)$ is a YES-instance of
SPECIAL 3-PARTITION. Note also that $k - m = b + 2m > 0$. Next, obtain the
multiset $S$ from $S''$ by adding $k - m$ copies of $k$ to $S''$. This takes polynomial
time, as $k$ is bounded by $p(m) + 3m$.
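
The construction can be written down as a short Python sketch. The function name `reduce_to_achievability` and the sample instance are hypothetical; the input is assumed to be a list of $3m$ positive integers summing to $mb$.

```python
def reduce_to_achievability(numbers, m, b):
    """Map a 3-PARTITION instance (numbers, m, b) to an H-index
    achievability instance (S, k), following the construction above."""
    if len(numbers) != 3 * m or sum(numbers) != m * b:
        raise ValueError("expected 3m positive integers summing to m*b")
    k = b + 3 * m
    shifted = [x + m for x in numbers]   # the multiset S''
    s = shifted + [k] * (k - m)          # add k - m copies of k
    return s, k


# Hypothetical instance: m = 2, b = 6, every number strictly between b/4 and b/2.
s, k = reduce_to_achievability([2, 2, 2, 2, 2, 2], m=2, b=6)
print(k)
print(sorted(s))
```

In this instance $k = 12$ and $S$ consists of six copies of 4 and ten copies of 12. The numbers in $S$ sum to $mk + (k - m)k = k^2$, so a partition of value at least $k$ must consist of exactly $k$ blocks that each sum to exactly $k$.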

We now show that $(S, k)$ is a YES-instance of the H-index achievability
problem if and only if $(S'', m, k)$ is a YES-instance of 3-PARTITION.

If $(S'', m, k)$ is a YES-instance of 3-PARTITION, then let $\mathcal{T}$ be a certificate
for that, so $\mathcal{T}$ is a partition of $S''$ into $m$ multisets such that the sum of the
numbers in each multiset is $k$. Then by adding to $\mathcal{T}$ exactly $k - m$ copies of
the set $\{k\}$, we obtain a certificate that $(S, k)$ is a YES-instance of the H-index
achievability problem, because the resulting partition consists of
$m + (k - m) = k$ multisets, each of which sums to $k$.

Conversely, if $(S, k)$ is a YES-instance of the H-index achievability problem,
then let $\mathcal{T}$ be a certificate for that. We can assume without loss of generality
that the partition $\mathcal{T}$ contains exactly $k - m$ copies of the set $\{k\}$. Indeed,
otherwise we can split each non-singleton set in $\mathcal{T}$ that contains a copy of $k$ into
singleton sets. This will result in a desired certificate.

By removing all singleton sets $\{k\}$ from $\mathcal{T}$ we obtain a partition $\mathcal{T}'$ of $S''$.
By the choice of $(S, k)$ this new partition $\mathcal{T}'$ contains $m$ multisets, each of
which sums up to $k$. $\mathcal{T}'$ does not contain any additional multiset besides these
$m$ multisets, as then we would have $\sum S'' > mk$, which is not the case by
construction. Therefore, $\mathcal{T}'$ is a certificate that $(S'', m, k)$ is a YES-instance of
3-PARTITION. □